Cumulative Search-Evasion Games (CSEGs)

James N. Eagle, Alan R. Washburn
Department of Operations Research 
Naval Postgraduate School 
Monterey, CA 93943 

CSEGs are search-evasion games where play proceeds throughout some specified period without
any interim feedback to either of the two players, each of whom is assumed to move according to
some preselected plan. If $(X_t, Y_t)$ are the positions of the two players at time $t$, then the payoff is

$$N = \sum_{t=1}^{T} A(X_t, Y_t, t).$$

That is, the payoff is a cumulative score over the time intervals $1, \ldots, T$.

One possibility is that $A(X_t, Y_t, t)$ is an indicator of the event $X_t = Y_t$, in which case $N$ is the
total number of coincidences. This is the definition that motivated the class, but it is not the only
possibility.

Both players are assumed to move among some finite set of cells C. Initial positions in C 
are determined by probability distributions which are known to both players. One possibility is 
that the distributions correspond to specific starting cells. This will be the case in a subsequent
example, but, again, it is not required.

Given his initial position $X_1$ and the assumed distribution for player 2's initial cell, player 1
must select a feasible track $X_1, X_2, \ldots, X_T$. A track is feasible if $X_{t+1}$ lies in the given set $S(X_t, t)$
for $1 \le t \le T-1$. These tracks are player 1's pure strategies. Likewise $Y_{t+1}$ must lie in the given
set $E(Y_t, t)$ for $1 \le t \le T-1$. Generally $S(i,t)$ and $E(i,t)$ include cell $i$ and some of its neighbors,
the idea being that feasible tracks should connect neighboring cells. The payoff is determined once
the tracks are selected by both players. Player 1 attempts to maximize the expected payoff, $E[N]$,
and player 2 to minimize it. Given the interpretation of the problem, it is natural to expect optimal
strategies for both sides to be mixed.

1. Discussion and Motivation 

One might prefer to consider a similar class of games where the pure strategy payoff is $1 - e^{-N}$, since
that quantity can be interpreted as a detection probability if $A(x,y,t)$ is “detection rate at time $t$”
(Koopman (1980)). Alternatively one might let the payoff be “time to the first detection,” as in
Ruckle's (1983) Pursuit on a Graph game. Such detection games are of considerable operational
interest. Single player versions where player 2's motion is according to a specified Markov process
have been considered by Stewart (1979, 1980), Eagle (1984), and Trummel and Weisinger (1986),
and there is a more extensive literature (Stone (1989)) if the searcher's path is not constrained. It
would indeed be satisfying to find an efficient method for solving the corresponding detection games
where the evader's path is not probabilistically specified, and where he can thus more completely
live up to his title. Unfortunately, the methods to be introduced later are tailored to the payoff $N$
rather than $1 - e^{-N}$. Of course $1 - e^{-N}$ is approximately equal to $N$ when $N$ is small, so CSEGs can
be regarded as first order approximations to detection games. The scale of $A(\cdot,\cdot,\cdot)$ is immaterial
in solving a CSEG, but the validity of the approximation to a detection game will be best when
$A(\cdot,\cdot,\cdot)$ is small.
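As a quick numerical illustration of the first order approximation (a sketch, not part of the original method; the helper name is hypothetical), the code below compares the CSEG payoff $N$ with the detection probability $1 - e^{-N}$. Since $1 - e^{-N} \le N$ for every $N \ge 0$, the CSEG payoff never understates the detection payoff of the same pair of tracks, and the ratio of the two is roughly $1 + N/2$ when $N$ is small.

```python
import math


def detection_probability(n_value: float) -> float:
    """Detection probability 1 - exp(-N) when N is an accumulated detection rate."""
    return 1.0 - math.exp(-n_value)


for n_value in (0.01, 0.1, 0.5, 1.0):
    p_detect = detection_probability(n_value)
    # N always overstates 1 - exp(-N); the ratio is about 1 + N/2 for small N
    print(f"N = {n_value:4.2f}  1 - exp(-N) = {p_detect:.4f}  ratio = {n_value / p_detect:.3f}")
```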

Direct motivation of CSEGs is also possible. There are a variety of reasons why the results of 
search might not be known until it is over. Photographic film might have to be developed or nets 
hauled in. Another possible application is search planning for autonomous vehicles; for example, 
an over-the-horizon unmanned aircraft whose track must be specified before launch. Also, there is 
no real reason in CSEGs for restricting the two sides to consist of a single agent each. The two 
sides might be teams or even armies, one seeking contact and the other desirous of avoiding it. 
The “no feedback” restriction might then be viewed as a natural consequence of the difficulties of 
communication in the field. 

Although the payoff in a CSEG has the same form as in a Multi-Stage Game (Thomas (1984)),
CSEGs are not MSGs. To make an MSG out of a CSEG one would have to reveal the position 
of each player to the other after each move, so that the joint position could serve as a “state.” 
Although such games are interesting, they are not what we have in mind here. 

2. Initial Observations

CSEGs are finite, two-person zero-sum games, so solutions certainly exist. The straightforward way 
to proceed would be to list all feasible tracks and then use linear programming to find the optimal 
probabilities for each track. The difficulty with this is that the number of feasible tracks explodes 
rapidly with the size of the problem. If the sets $S(i,t)$ all have three elements, and if the initial
distribution for player 1's position is also concentrated on three points, there are $3^T$ pure strategies
for player 1. This kind of exponential growth makes the “brute force” approach impractical for
even moderately sized problems. The object must be to take advantage of the special structure of 
CSEGs to develop more efficient methods. 
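The growth can be seen without listing any tracks. The sketch below counts feasible tracks with a forward recursion; it assumes, for illustration only, a line of cells on which a player may stay put or step to an adjacent cell (the movement rule used later in Section 5). The helper name `count_tracks` is hypothetical.

```python
def count_tracks(n_cells: int, periods: int, start: int) -> int:
    """Number of feasible T-period tracks from `start` on a line of cells,
    where each move stays put or steps to a neighbouring cell."""
    ways = [0] * n_cells
    ways[start] = 1  # period 1: every track is at its starting cell
    for _ in range(periods - 1):
        nxt = [0] * n_cells
        for cell, count in enumerate(ways):
            for step in (-1, 0, 1):
                if 0 <= cell + step < n_cells:
                    nxt[cell + step] += count
        ways = nxt
    return sum(ways)


for T in (4, 8, 16, 32):
    print(T, count_tracks(n_cells=12, periods=T, start=0))
```

Each extra period multiplies the count by nearly three, which is why enumerating tracks quickly becomes hopeless.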

A mixed strategy for either player is a discrete probability distribution over the possible feasible
tracks. Given mixed strategies for players 1 and 2, let $p(i,t)$ be the marginal probability that
player 1 visits cell $i$ at time $t$. Likewise let $q(j,t)$ be the corresponding probability for player 2.

Then since the expectation of a sum is the sum of expectations, and since the two players choose 
their strategies independently, 

$$E[N] = \sum_{t=1}^{T} \sum_{i \in C} \sum_{j \in C} A(i,j,t)\, p(i,t)\, q(j,t).$$
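The identity can be checked numerically. The sketch below computes $E[N]$ twice for two small, made-up mixed strategies (hypothetical tracks on three cells over three periods, with the coincidence payoff): once by averaging over every pair of pure tracks, and once from the marginals $p$ and $q$ alone. All function names are illustrative.

```python
import math
from itertools import product


def coincidence(i, j, t):
    """A(i, j, t) = 1 when searcher cell i equals evader cell j, for any period t."""
    return 1.0 if i == j else 0.0


def marginals(mix, n_cells, periods):
    """m[t][c] = probability that the mixed strategy's track is in cell c at period t."""
    m = [[0.0] * n_cells for _ in range(periods)]
    for prob, track in mix:
        for t, cell in enumerate(track):
            m[t][cell] += prob
    return m


def expected_payoff_by_pairs(A, mix1, mix2):
    """E[N] by enumerating all pairs of pure tracks (exponential in general)."""
    return sum(
        p1 * p2 * sum(A(x, y, t) for t, (x, y) in enumerate(zip(track1, track2)))
        for (p1, track1), (p2, track2) in product(mix1, mix2)
    )


def expected_payoff_by_marginals(A, p, q):
    """E[N] = sum_t sum_i sum_j A(i, j, t) p(i, t) q(j, t)."""
    n_cells = len(p[0])
    return sum(
        A(i, j, t) * p[t][i] * q[t][j]
        for t in range(len(p)) for i in range(n_cells) for j in range(n_cells)
    )


searcher = [(0.5, (0, 1, 2)), (0.5, (0, 0, 1))]  # made-up mixed strategy for player 1
evader = [(0.7, (2, 2, 2)), (0.3, (2, 1, 0))]    # made-up mixed strategy for player 2
direct = expected_payoff_by_pairs(coincidence, searcher, evader)
via_marginals = expected_payoff_by_marginals(
    coincidence, marginals(searcher, 3, 3), marginals(evader, 3, 3))
print(direct, via_marginals, math.isclose(direct, via_marginals))
```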

This payoff depends only on the marginal distributions $p(\cdot,\cdot)$ and $q(\cdot,\cdot)$, so there is the possibility of
an analysis based directly on them, rather than on the mixed strategies themselves. Furthermore,
when $p(\cdot,\cdot)$ is given, player 2's problem in selecting an optimal track is a $T$-period shortest path
problem, a relatively simple type that can be solved quickly even for large problems. To see this, let
$D(j,t) = \sum_{i \in C} A(i,j,t)\, p(i,t)$ be the penalty associated with visiting cell $j$ at time $t$. Then player 2
wants to find a feasible track that visits the cells in such a manner as to minimize the sum of all
$T$ such penalties, a shortest path problem that can easily be solved using dynamic programming.
Given a mixed strategy for player 1, this shortest path solution gives a lower bound on the value
of the game. Similar comments hold concerning player 1's selection of a track when $q(\cdot,\cdot)$ is given.
The fact that a lower bound on the value of the game is determined by specifying $p(\cdot,\cdot)$ and solving
a shortest path problem, and that an upper bound is found by specifying $q(\cdot,\cdot)$ and solving a longest
path problem, will prove invaluable in the techniques to be discussed in the following sections.
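A minimal sketch of both best-response computations, assuming the coincidence payoff (so $D(j,t) = p(j,t)$ for player 2, and player 1's reward for cell $i$ is $q(i,t)$) and hypothetical neighbour lists `moves[c]` standing in for $S(c,t)$ and $E(c,t)$, taken here to be the same for every $t$. The backward recursion returns, for each starting cell, the best total over periods $1, \ldots, T$. Because each player learns his own initial cell before choosing a track, the bound is that total averaged over his initial distribution.

```python
def best_totals(weights, moves, maximize):
    """Backward dynamic programme over the T-period network.

    weights[t][c] is collected for occupying cell c at period t.
    Returns best[c]: the optimal total over all periods for a track starting in c.
    """
    pick = max if maximize else min
    best = list(weights[-1])  # last period: nothing left to choose
    for t in range(len(weights) - 2, -1, -1):
        best = [weights[t][c] + pick(best[k] for k in moves[c]) for c in range(len(moves))]
    return best


def line_moves(n_cells):
    """Hypothetical movement rule: stay put or step to an adjacent cell."""
    return [[k for k in (c - 1, c, c + 1) if 0 <= k < n_cells] for c in range(n_cells)]


def lower_bound(p, evader_start_dist, moves):
    """What player 1 guarantees with marginals p: player 2's shortest path value."""
    g = best_totals(p, moves, maximize=False)
    return sum(prob * g[j] for j, prob in enumerate(evader_start_dist))


def upper_bound(q, searcher_start_dist, moves):
    """What player 2 concedes with marginals q: player 1's longest path value."""
    h = best_totals(q, moves, maximize=True)
    return sum(prob * h[i] for i, prob in enumerate(searcher_start_dist))


moves = line_moves(4)
rush = [[1, 0, 0, 0], [0.5, 0.5, 0, 0], [0, 0.5, 0.5, 0], [0, 0, 0.5, 0.5]]
stay = [[0, 0, 0, 1] for _ in range(4)]
print(lower_bound(rush, [0, 0, 0, 1], moves))  # searcher marginals guarantee this much
print(upper_bound(stay, [1, 0, 0, 0], moves))  # a stationary evader concedes at most this
```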

CSEGs often have a “turnpike” property (Whittle (1983)) in the sense that optimal marginal
distributions are attracted to a certain equilibrium pair $(p^*(\cdot,\cdot), q^*(\cdot,\cdot))$. More precisely, let $v(t)$ be
the value of the one-period matrix game $A(\cdot,\cdot,t)$, and let $p^*(\cdot,t)$ and $q^*(\cdot,t)$ be optimal mixed
strategies for the two players, unrestricted except that each must be a discrete probability distribution
over the cells in $C$. If $p^*(\cdot,t)$ and $q^*(\cdot,t)$ are feasible marginal distributions for each time
period of a $T$-period CSEG, then they must also be optimal. Furthermore, the value of the CSEG
is $\sum_{t=1}^{T} v(t)$. In general the feasibility requirement will fail because $p(\cdot,t)$ and $q(\cdot,t)$ are required
by the path constraints to resemble the initial distributions for small values of $t$. However, we can
say



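Theorem 1. Suppose $p(\cdot,\cdot)$ and $q(\cdot,\cdot)$ are optimal for the $T_1$-period CSEG, suppose $T_2 > T_1$, and
let

$$(\hat{p}(\cdot,t), \hat{q}(\cdot,t)) = \begin{cases} (p(\cdot,t), q(\cdot,t)) & \text{for } t \le T_1, \\ (p^*(\cdot,t), q^*(\cdot,t)) & \text{for } T_1 < t \le T_2. \end{cases}$$

If $\hat{p}(\cdot,\cdot)$ and $\hat{q}(\cdot,\cdot)$ are feasible for the $T_2$-period CSEG, then they are also optimal.

Proof: Let $E[N(T)]$ be the expected payoff and $V(T)$ be the value of the $T$-period CSEG. Since
$p(\cdot,\cdot)$ is optimal for the $T_1$-period game, $E[N(T_1)] \ge V(T_1)$ when player 1 uses $p(\cdot,\cdot)$ and player 2
uses any feasible mixed strategy. Since $\hat{p}(\cdot,\cdot)$ agrees with $p(\cdot,\cdot)$ for $t \le T_1$, the same can be said of
$\hat{p}(\cdot,\cdot)$. Moreover, in each period $t > T_1$ the marginal $p^*(\cdot,t)$ earns at least $v(t)$ whatever player 2's
marginal is, since it is optimal in the one-period game. Therefore if player 1 uses $\hat{p}(\cdot,\cdot)$,

$$E[N(T_2)] \ge V(T_1) + \sum_{t=T_1+1}^{T_2} v(t).$$

Likewise if player 2 uses $\hat{q}(\cdot,\cdot)$,

$$E[N(T_2)] \le V(T_1) + \sum_{t=T_1+1}^{T_2} v(t).$$

The theorem follows. Furthermore, the value of the $T_2$-period CSEG is

$$V(T_2) = V(T_1) + \sum_{t=T_1+1}^{T_2} v(t).$$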



If (i) $A(\cdot,\cdot,t)$ does not actually depend on $t$, then neither will $p^*(\cdot,t)$ nor $q^*(\cdot,t)$. Additionally,
if (ii) the path constraints allow both players to remain stationary, then these two “equilibrium”
distributions will be feasible at $t+1$ if they are feasible at $t$. Finally, if (i) and (ii) hold, and
$p^*(\cdot,\cdot)$ and $q^*(\cdot,\cdot)$ are feasible at time $t$, then $p^*(\cdot,\cdot)$ and $q^*(\cdot,\cdot)$ are feasible and optimal marginal
distributions from $t$ onward. Solving the CSEG can then be viewed as programming the two sides to
move from the given initial position distributions to equilibrium distributions. Only the transient
phase presents any computational difficulty; once the equilibrium distributions are encountered,
they are feasible and optimal from that point on. We now turn to methods for solving specific
CSEGs.
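Whether a candidate pair $(p^*(\cdot,t), q^*(\cdot,t))$ is optimal for a one-period game can be certified directly. The sketch below (helper names are hypothetical) computes what $p^*$ guarantees player 1 and the most player 1 can obtain against $q^*$; equal numbers certify both strategies and give $v(t)$. With the coincidence payoff on $m$ cells the uniform pair certifies $v(t) = 1/m$, the stationary value behind $1/2n$ in Section 5 and .04 on the $5 \times 5$ grid of Section 6. Exact fractions avoid rounding in the comparison.

```python
from fractions import Fraction


def certified_bounds(A, p, q):
    """Bounds from a candidate pair (p, q) for the one-period matrix game A.

    guaranteed = min_j sum_i p_i A[i][j]  (what p secures for player 1)
    conceded   = max_i sum_j A[i][j] q_j  (the most player 1 can get against q)
    guaranteed <= v <= conceded; equality certifies p and q optimal.
    """
    rows, cols = range(len(A)), range(len(A[0]))
    guaranteed = min(sum(p[i] * A[i][j] for i in rows) for j in cols)
    conceded = max(sum(A[i][j] * q[j] for j in cols) for i in rows)
    return guaranteed, conceded


m = 4
identity = [[1 if i == j else 0 for j in range(m)] for i in range(m)]  # coincidence payoff
uniform = [Fraction(1, m)] * m
print(certified_bounds(identity, uniform, uniform))
```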



3. The Brown-Robinson Method 



In Robinson (1951), the method of fictitious play was shown to iteratively solve two-person zero-sum
matrix games. This procedure had been suggested earlier by G. W. Brown. To describe fictitious
play, let player 1 be the row (maximizing) player and player 2 be the column (minimizing) player.
Rows and columns correspond to the pure strategies (tracks) described earlier. If player 1 selects
row $i$ and player 2 selects column $j$, then reward $a_{ij}$ is paid from player 2 to player 1; let $A = (a_{ij})$. In each
fictitious play of the game (except the first), the players select the best pure strategy response to
the empirical mix of the opponent's pure strategies observed so far. So at play $k \ge 2$, player 1
chooses the pure strategy $x_k$ (a vector where every component but one is 0) that is a best response
to $\bar{y}_{k-1}$, where

$$\bar{y}_k = \frac{1}{k} \sum_{t=1}^{k} y_t = \bar{y}_{k-1} + \frac{1}{k}\left(y_k - \bar{y}_{k-1}\right)$$

and $y_t$ is the pure strategy played by player 2 at play $t$. Then player 2 chooses the pure strategy
$y_k$, which is the best response to the updated row average

$$\bar{x}_k = \bar{x}_{k-1} + \frac{1}{k}\left(x_k - \bar{x}_{k-1}\right).$$



Any limit points of the sequences $\{\bar{x}_k\}$ and $\{\bar{y}_k\}$ are solutions to the game. Also upper and
lower bounds on the value of the game, $v$, are determined at each game play. Specifically, at game
play $k$,

$$\underline{v}_k = \bar{x}_k^{\mathsf{T}} A\, y_k \le v \le x_k^{\mathsf{T}} A\, \bar{y}_{k-1} = \bar{v}_k,$$

and both $\underline{v}_k$ and $\bar{v}_k$ converge to $v$. Fictitious play begins with the players selecting arbitrary
strategies (pure or mixed) $x_1 = \bar{x}_1$ and $y_1 = \bar{y}_1$.
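A minimal sketch of fictitious play for an explicit payoff matrix, following the update order above. The function name and the choice of row 0 and column 0 as the arbitrary opening plays are assumptions. Since every $\underline{v}_k$ and $\bar{v}_k$ is a valid bound, the sketch reports the tightest ones seen so far.

```python
def fictitious_play(A, plays):
    """Brown-Robinson fictitious play for matrix game A (row player maximizes).

    Returns (lower, upper, x_bar, y_bar): the best bounds on the value found so
    far and the empirical mixed strategies after `plays` plays.
    """
    m, n = len(A), len(A[0])
    x_count, y_count = [0] * m, [0] * n
    x_count[0] += 1  # arbitrary first plays x_1 and y_1
    y_count[0] += 1
    lower, upper = float("-inf"), float("inf")
    for k in range(2, plays + 1):
        # player 1: best row against y_bar_{k-1}; its payoff is an upper bound on v
        row_payoff = [sum(A[i][j] * y_count[j] for j in range(n)) / (k - 1) for i in range(m)]
        i_best = max(range(m), key=row_payoff.__getitem__)
        x_count[i_best] += 1
        upper = min(upper, row_payoff[i_best])
        # player 2: best column against x_bar_k; its payoff is a lower bound on v
        col_payoff = [sum(A[i][j] * x_count[i] for i in range(m)) / k for j in range(n)]
        j_best = min(range(n), key=col_payoff.__getitem__)
        y_count[j_best] += 1
        lower = max(lower, col_payoff[j_best])
    return lower, upper, [c / plays for c in x_count], [c / plays for c in y_count]


rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock-paper-scissors, value 0
print(fictitious_play(rps, 2000)[:2])
```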

We note that to solve a matrix game by fictitious play, each player need only be able to select
a best pure strategy response to any mixed strategy and evaluate the expected return. For CSEGs,
this means that for fictitious play number $k \ge 2$, player 1 must be able to first update the running
average of the previously observed $k-1$ pure strategies played by player 2, and then solve the $T$-period
longest path problem giving the best pure strategy response for player 1. Similarly, player 2
must be able to update the running average of the previously observed $k$ pure strategies played
by player 1, and then solve the shortest path problem giving the best pure strategy response for
player 2. Because the payoff depends only on marginals, each running average can be kept as a
running average of marginal distributions rather than as a list of tracks. The procedure begins with
both players selecting arbitrary $T$-period strategies.
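A compact sketch of the procedure for the one-dimensional game of Section 5, under assumptions not taken from the original program: pure starting cells, the coincidence payoff, neighbour sets $\{i-1, i, i+1\}$ in every period, and “stand still” as the arbitrary first play. Each player keeps running visit counts (unnormalized marginals) rather than a list of tracks. All names are hypothetical.

```python
def best_track(weights, moves, start, maximize):
    """Longest (maximize=True) or shortest path from `start` through the T-period
    network, collecting weights[t][c] for cell c at period t. Returns (total, track)."""
    pick = max if maximize else min
    value = list(weights[-1])
    successors = []  # successors[t][c]: best next cell from c at period t
    for t in range(len(weights) - 2, -1, -1):
        succ = [pick(moves[c], key=lambda k: value[k]) for c in range(len(moves))]
        value = [weights[t][c] + value[succ[c]] for c in range(len(moves))]
        successors.append(succ)
    successors.reverse()
    track = [start]
    for succ in successors:
        track.append(succ[track[-1]])
    return value[start], track


def cseg_fictitious_play(n_cells, periods, s0, e0, plays):
    """Best (lower, upper) bounds on the CSEG value after `plays` fictitious plays."""
    moves = [[k for k in (c - 1, c, c + 1) if 0 <= k < n_cells] for c in range(n_cells)]
    P = [[0] * n_cells for _ in range(periods)]  # searcher visit counts per period
    Q = [[0] * n_cells for _ in range(periods)]  # evader visit counts per period

    def record(counts, track):
        for t, cell in enumerate(track):
            counts[t][cell] += 1

    record(P, [s0] * periods)  # arbitrary first plays: both players stand still
    record(Q, [e0] * periods)
    lower, upper = float("-inf"), float("inf")
    for k in range(2, plays + 1):
        q_bar = [[c / (k - 1) for c in row] for row in Q]
        total, track = best_track(q_bar, moves, s0, maximize=True)  # longest path
        upper = min(upper, total)
        record(P, track)
        p_bar = [[c / k for c in row] for row in P]
        total, track = best_track(p_bar, moves, e0, maximize=False)  # shortest path
        lower = max(lower, total)
        record(Q, track)
    return lower, upper


print(cseg_fictitious_play(n_cells=4, periods=4, s0=0, e0=3, plays=500))
```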

The Brown-Robinson method is notorious for converging very slowly to the optimal solution.
However, the simplicity of the updating procedure, which allows solution of moderate-sized problems
on microcomputers, makes it appealing for CSEGs.

4. The Linear Programming (LP) Method 

It has been mentioned that CSEGs could conceivably be solved with LP methods if all pure strategies
are enumerated. In this section an LP formulation is presented which does not require this
explicit enumeration yet, unlike fictitious play, solves the game exactly.

To set up the LP, first let $g(j,t)$ be the smallest possible payoff accumulated over periods
$t, t+1, \ldots, T$, given that player 2 starts in cell $j$ at time $t$ and that player 1's mixed strategy has
marginals $p(\cdot,\cdot)$. Then, for $t = 1, \ldots, T-1$,

$$g(j,t) = \sum_{i \in C} A(i,j,t)\, p(i,t) + \min_{k \in E(j,t)} g(k,t+1), \qquad (1)$$

with $g(j,T) = \sum_{i \in C} A(i,j,T)\, p(i,T)$. Since player 2's location at time 1 is specified by the distribution $q(\cdot)$, player 1's object is to
maximize $E[N] = \sum_{j \in C} g(j,1)\, q(j)$.

The feasibility (i.e., path) constraints are incorporated by introducing $u(i,j,t)$ as the probability
that player 1 visits cell $i$ at time $t$ and cell $j$ at time $t+1$. Then the marginal variables $p(\cdot,\cdot)$
can be dispensed with because

$$p(i,t) = \sum_{j \in S(i,t)} u(i,j,t); \quad i \in C,\ t = 1, \ldots, T-1, \qquad (2)$$

or alternatively,

$$p(i,t) = \sum_{j \in S^*(i,t)} u(j,i,t-1); \quad i \in C,\ t = 2, \ldots, T. \qquad (3)$$

Here $S^*(i,t) = \{j \in C : i \in S(j,t-1)\}$ for $i$ in $C$ and $t = 2, \ldots, T$ is “the set of cells player 1 might
have come from.” This is distinguished from $S(i,t)$, which is “the set of cells to which player 1
might go.” As long as the right hand sides of (2) and (3) are equal, the common value is a feasible
marginal distribution for player 1. Using only the $u(\cdot,\cdot,\cdot)$, $g(\cdot,\cdot)$, and $p(\cdot,T)$ variables, player 1's
problem is the following LP (the indicated dual variables will later be associated with player 2's
LP):



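$$
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{i \in C} g(i,1)\, q(i) & \\[2ex]
\text{subject to:} & & \text{dual variables} \\[1ex]
 & \displaystyle\sum_{k \in S(i,1)} u(i,k,1) = p(i); \quad i \in C & h(i,1) \quad (4) \\
 & \displaystyle -\sum_{j \in S^*(i,t)} u(j,i,t-1) + \sum_{k \in S(i,t)} u(i,k,t) = 0; \quad i \in C,\ t = 2, \ldots, T-1 & h(i,t) \quad (5) \\
 & \displaystyle -\sum_{j \in S^*(i,T)} u(j,i,T-1) + p(i,T) = 0; \quad i \in C & h(i,T) \quad (6) \\
 & \displaystyle -\sum_{k \in C} A(k,i,T)\, p(k,T) + g(i,T) \le 0; \quad i \in C & q(i,T) \quad (7) \\
 & \displaystyle -\sum_{i \in C} \sum_{l \in S(i,t)} A(i,j,t)\, u(i,l,t) - g(k,t+1) + g(j,t) \le 0; \quad j \in C,\ k \in E(j,t),\ t = 1, \ldots, T-1 & v(j,k,t) \quad (8) \\[1ex]
 & u(i,j,t) \ge 0; \quad i,j \in C,\ t = 1, \ldots, T-1 & \\
 & p(i,T) \ge 0; \quad i \in C &
\end{array}
$$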



Constraints (4) enforce the starting condition $p(i,1) = p(i)$; constraints (5) enforce the equality of
(2) and (3); constraints (6) and (7) are the appropriate terminal conditions for $p(i,T)$ and $g(i,T)$;
and constraints (8) are implied by (1). A proof that (8) and (1) are actually equivalent, and that the
solution of the LP is therefore the solution of the game, could be based on an inductive argument
that the objective function cannot be maximal unless at least one (8)-type constraint is tight for
each pair $(j,t)$. However, it is simpler to merely observe that the solution of this LP is in any case a
lower bound on the value of the game, and to conclude equality from the fact that the dual of this
LP is the corresponding minimization problem for player 2.
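The bookkeeping behind (2), (3), and (5) can be illustrated with a small sketch (hypothetical helper names; periods numbered from 1 as in the text). Arc probabilities $u(i,j,t)$ are stored in a dictionary, marginals are computed both by outflow as in (2) and by inflow as in (3), and the two are compared for $t = 2, \ldots, T-1$, which is what constraints (5) require.

```python
from collections import defaultdict


def outflow_inflow(u):
    """Marginals implied by arc probabilities u[(i, j, t)] = P(cell i at t, cell j at t+1)."""
    out_of = defaultdict(float)  # (2): p(i, t) = sum_j u(i, j, t)
    into = defaultdict(float)    # (3): p(j, t+1) = sum_i u(i, j, t)
    for (i, j, t), prob in u.items():
        out_of[(i, t)] += prob
        into[(j, t + 1)] += prob
    return out_of, into


def satisfies_conservation(u, periods, tol=1e-12):
    """True when (2) and (3) agree for every cell at periods 2..T-1, as in (5)."""
    out_of, into = outflow_inflow(u)
    keys = {key for key in set(out_of) | set(into) if 2 <= key[1] <= periods - 1}
    return all(abs(out_of[key] - into[key]) <= tol for key in keys)


# Two equally likely searcher tracks on cells 1..4 over T = 4 periods:
# 1,2,3,4 and 1,1,2,3 (the "rushing" pattern of Section 5).
rush = {(1, 2, 1): 0.5, (1, 1, 1): 0.5, (2, 3, 2): 0.5,
        (1, 2, 2): 0.5, (3, 4, 3): 0.5, (2, 3, 3): 0.5}
broken = {(1, 2, 1): 1.0, (3, 4, 2): 1.0}  # arrives in cell 2 but leaves from cell 3
print(satisfies_conservation(rush, 4), satisfies_conservation(broken, 4))
```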

This duality relationship will also allow us to identify the optimal solution for one player from
the optimal dual variables in his opponent's LP. To see this, let $v(i,j,t)$ be player 2's counterpart
to $u(i,j,t)$, let $E^*(i,t) = \{j \in C : i \in E(j,t-1)\}$ be defined like $S^*(i,t)$, and let $h(i,t)$ be the maximum
obtainable expected total reward when player 1 starts in cell $i$ at time $t$ and player 2 uses marginals
$q(\cdot,\cdot)$. Then the problem player 2 must solve, which is the dual of player 1's LP, is



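$$
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{i \in C} h(i,1)\, p(i) & \\[2ex]
\text{subject to:} & & \text{dual variables} \\[1ex]
 & \displaystyle\sum_{k \in E(i,1)} v(i,k,1) = q(i); \quad i \in C & g(i,1) \quad (9) \\
 & \displaystyle -\sum_{j \in E^*(i,t)} v(j,i,t-1) + \sum_{k \in E(i,t)} v(i,k,t) = 0; \quad i \in C,\ t = 2, \ldots, T-1 & g(i,t) \quad (10) \\
 & \displaystyle -\sum_{j \in E^*(i,T)} v(j,i,T-1) + q(i,T) = 0; \quad i \in C & g(i,T) \quad (11) \\
 & \displaystyle -\sum_{k \in C} A(i,k,T)\, q(k,T) + h(i,T) \ge 0; \quad i \in C & p(i,T) \quad (12) \\
 & \displaystyle -\sum_{j \in C} \sum_{l \in E(j,t)} A(i,j,t)\, v(j,l,t) - h(k,t+1) + h(i,t) \ge 0; \quad i \in C,\ k \in S(i,t),\ t = 1, \ldots, T-1 & u(i,k,t) \quad (13) \\[1ex]
 & v(i,j,t) \ge 0; \quad i,j \in C,\ t = 1, \ldots, T-1 & \\
 & q(i,T) \ge 0; \quad i \in C &
\end{array}
$$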



Player 1's LP can be made smaller by using (6) to solve for $p(i,T)$ and then substituting into
(7). This eliminates constraints (6) and variables $p(i,T)$. Likewise constraints (11) and variables
$q(i,T)$ can be eliminated from player 2's LP. After these simplifications, the number of variables
in player 1's LP is the number of nodes plus the number of arcs in the $T$-period network specified
by constraints (4) and (5). Similarly, the number of variables in player 2's smaller problem is
the number of nodes plus arcs defined by constraints (9) and (10). Furthermore, the number of
constraints in one player's LP is equal to the number of variables in his opponent's problem. So
for both players, the number of variables and constraints expands linearly with $T$ rather than
exponentially. Thus for other than very small problems, solving these LPs is less burdensome than
the “brute force” LP procedure mentioned earlier.
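The linear growth is easy to see by counting. The sketch below (hypothetical helpers; it counts every cell in every period and assumes the neighbour sets do not change with $t$) returns the number of nodes plus arcs of the $T$-period network. For a $5 \times 5$ grid on which a player may stay put or move to any cell sharing a side or a corner, with $T = 8$, it gives $200 + 1183 = 1383$, which matches the number of variables reported for the two-dimensional example of Section 6.

```python
def grid_moves(rows, cols):
    """Neighbour sets on a grid: stay put or move to any cell sharing a side or a corner."""
    moves = {}
    for r in range(rows):
        for c in range(cols):
            moves[(r, c)] = [(r + dr, c + dc)
                             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                             if 0 <= r + dr < rows and 0 <= c + dc < cols]
    return moves


def lp_size(moves, periods):
    """Nodes plus arcs of the T-period network: one node per (cell, period),
    one arc per feasible move between consecutive periods."""
    nodes = len(moves) * periods
    arcs = (periods - 1) * sum(len(nbrs) for nbrs in moves.values())
    return nodes + arcs


grid = grid_moves(5, 5)
for T in (8, 16, 32, 64):
    print(T, lp_size(grid, T))  # grows linearly in T
```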

When compared to fictitious play, the LP procedure's primary advantage is that exact answers 
are produced. One would expect to resort to fictitious play only when the LP problem size exceeds 
the capability of available LP solvers. 



5. The One-Dimensional Game 

Consider a CSEG where $2n$ cells ($n \ge 1$) are arranged linearly with the searcher (player 1) initially
in cell 1 and the evader (player 2) initially in cell $2n$. Transitions to neighboring cells are possible,
or either party may remain stationary. Thus, except for end cells 1 and $2n$, $E(i,t) = S(i,t) =
E^*(i,t) = S^*(i,t) = \{i-1, i, i+1\}$ for all $t$. The payoff at time $t$ is 1 if searcher and evader are in
the same cell, otherwise 0. The equilibrium distributions $p^*(\cdot,t)$ and $q^*(\cdot,t)$ are easily demonstrated
to be uniform, so for large $T$ we expect the value of the $T$-period game to be $v_n(T) = T/2n - K_n$
for some $K_n$. Questions of interest are:

• Is $K_n$ predictable, and what does “large $T$” mean?

• What is the nature of the optimal strategies? 

One reasonable strategy for the evader is what we will call “spreading.” The idea is to achieve 
the equilibrium distribution as fast as possible, and while doing so to assure that every cell feasible 
for the searcher contains at most the equilibrium probability. Spreading is not feasible in every 
CSEG, but the evader has no trouble employing it in the game under consideration. Figure 1 shows 
a spreading strategy when there are four cells. 

[Figure 1: cells 1 to 4 across, times 1 to 5 down.]

Figure 1. Evader “spreads” unit probability over 4 cells. Cells not feasible for
the searcher are shaded.

Since the searcher can obtain nothing on the first 2 opportunities and at most .25 per look on the
third and subsequent opportunities, $v_2(T) \le (T-2)/4$ for $T \ge 2$. Therefore $K_2 \ge .5$. In fact
$v_n(T) \le (T-n)/2n$ for $T \ge n$ because evader spreading is feasible for any $n$, so $K_n \ge .5$ for $n \ge 1$.

Searcher spreading is also feasible here. Against searcher spreading the evader's best strategy
is to simply remain stationary, in which case there is no payoff for the first $2n-1$ time periods.
Therefore $v_n(T) \ge (T-(2n-1))/2n$ for $T \ge 2n-1$, and hence $K_n \le (2n-1)/2n$. Thus

$$.5 \le K_n \le \frac{2n-1}{2n} < 1. \qquad (14)$$

For all $T$, spreading is optimal for both sides when $n = 1$. It also turns out to be optimal for the
evader when $T = 2n$, a game that is of some interest because $2n$ is the smallest value of $T$ such that
the solution is not trivial. To see this, note first that we have already established that $v_n(2n) \le .5$.
The searcher can also guarantee a payoff of .5, but by “rushing” rather than spreading. In rushing,
the searcher essentially charges from one end to the other at top speed, except that for all $t$ such
that $2 \le t \le 2n$ he must be equally likely to occupy cells $t$ and $t-1$; the split is required to
prevent the possibility that the evader might pass by without coincidence. By rushing, the searcher
guarantees that the probability of a coincidence somewhere in the first $2n$ periods is at least .5, so
$v_n(2n) \ge .5$. Therefore $v_n(2n) = .5$, since the opposite inequality has already been established.
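The rushing guarantee can be checked by dynamic programming: build the searcher's rushing marginals and compute the evader's best reply (a shortest path over all evader tracks). The sketch below uses 0-based cells and periods, and its helper names are hypothetical. The printed minimum is exactly .5 for every $n$: an evader who never steps onto cell $t$ or $t-1$ at period $t$ would have to stay strictly ahead of the searcher all the way to cell $2n$, which is impossible, and remaining in cell $2n$ yields exactly .5.

```python
def rushing_marginals(n):
    """Searcher marginals p[t][c] for rushing on 2n cells over T = 2n periods.

    0-based: at period 0 the searcher is in cell 0 with certainty, and at every
    later period t he is in cell t or cell t - 1 with probability 1/2 each.
    """
    m = 2 * n
    p = [[0.0] * m for _ in range(m)]
    p[0][0] = 1.0
    for t in range(1, m):
        p[t][t] = p[t][t - 1] = 0.5
    return p


def evader_best_reply(p, start):
    """Smallest expected number of coincidences over all evader tracks from `start`
    (stay put or step to an adjacent cell each period)."""
    m = len(p[0])
    g = list(p[-1])
    for t in range(len(p) - 2, -1, -1):
        g = [p[t][c] + min(g[k] for k in (c - 1, c, c + 1) if 0 <= k < m)
             for c in range(m)]
    return g[start]


for n in range(1, 7):
    print(n, evader_best_reply(rushing_marginals(n), start=2 * n - 1))  # 0.5 each time
```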

Obviously the searcher could continually rush from one end to the other, obtaining a payoff
of .5 for every $2n$ time periods. This is not attractive when $T$ is large, however, since that is only
$1/4n$ per time period, whereas a uniform distribution will in the long run obtain a payoff of $1/2n$ per
time period. The searcher's dilemma is that rushing and spreading each have their attractions.
Unfortunately the two strategies are incompatible in that rushing retains a concentrated distribution,
whereas spreading aims for uniformity. This dilemma does not exist for the evader, since spreading
is optimal for $T = 2n$ and also attractive in the long run. One might therefore expect that $K_n$ would
be closer to .5 than to 1 in (14). This turns out to be the case. Table 1 shows $K_n$ for $1 \le n \le 6$ as
established with linear programming formulations generated by the General Algebraic Modeling
System (GAMS) and solved with MINOS (Modular In-core Nonlinear Optimization System) on the
NPS IBM 3033AP mainframe computer.



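$$
\begin{array}{c|c|c}
n & K_n & T_n \\ \hline
1 & .5000 & 2 \\
2 & .5357 & 6 \\
3 & .5431 & 10 \\
4 & .5440 & 13 \\
5 & .5459 & 14 \\
6 & .5459 & 19
\end{array}
$$

Table 1. $K_n$ and $T_n$ for $n = 1, \ldots, 6$.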

Additionally $T_n$ is listed, which is the first time both probability distributions become uniform. $T_n$
is remarkably close to $3n$, but is not $3n$ exactly. When $n = 3$, player 1 can only force a payoff of
$9/6 - .5433$ at time 9 if uniformity at time 9 is forced.

Figure 2 shows how the searcher's probability distribution $p(\cdot,\cdot)$ evolves with time when there
are $2n = 12$ cells and $T$ is 19 or larger. The first six time periods are not shown because $p(t,t) = 1$ for
$t \le 6$; the searcher moves forward at top speed as long as contact with the evader is physically
impossible. Equilibrium first appears at time 19. The searcher's motion might reasonably be
characterized as a compromise between rushing and spreading.

Figure 3 shows the evader's probability distribution $q(\cdot,\cdot)$. The highest probability is in cell
12 at time 11, the last time at which the searcher is guaranteed not to be there. That probability
(.306) is evenly divided between cells 11 and 12 at time 12, and then spreads out from there. The
equilibrium distribution first appears at time 18. Note that the probability in low-numbered cells
goes through a maximum. This also happens with the searcher ($p(1,17)$ exceeds $p(1,19) =
.0833$), but much more weakly.




Figure 2. Searcher’s Probability Distribution Evolving with Time. 
Figure 3. Evader's Probability Distribution Evolving with Time.



6. A Two-Dimensional Example 



Now consider an 8-period problem where a searcher and an evader move over a $5 \times 5$ grid of cells.
The searcher begins in the upper left cell and the evader begins in the lower right cell. The searcher
detects the evader with certainty if they share the same cell. Both players can move between cells
in a single time period if the cells share a side or a corner. This problem has approximately 381,000
pure strategies (i.e., feasible paths) for each player. It can be solved with linear programming but
is large enough to make the Brown-Robinson method attractive, especially if a microcomputer
solution is desired.

The Brown-Robinson procedure for this problem was programmed in Fortran 77 on a Macintosh
IIx computer. After 40,000 fictitious plays, mixed strategies for both sides were generated
which bounded the value of the game between .1845 and .1938. On this microcomputer, approximately
5 fictitious plays per second were accomplished. Figure 4 indicates the rate of convergence
of the bounds.
Figure 4. Bounds on the Value of the Game Generated by Fictitious Play.



The same problem was solved exactly using linear programming. Required were 1383 variables
and the same number of constraints. An optimal solution was obtained after 2385 pivots and used
approximately 410 CPU seconds on the NPS mainframe. The value of the game is .1891. Optimal
marginal distributions for the searcher and evader (×1000) are shown in Figures 5 and 6.

Since any $u(\cdot,\cdot,\cdot)$ and $v(\cdot,\cdot,\cdot)$ will be optimal if they satisfy the path constraints and have
optimal marginal distributions, it is reasonable to suspect that this problem might have many
optimal solutions. This, in fact, is the case. Even the marginals are not unique. For example,
any marginal distribution for the evader at time 2 is optimal if it “connects” optimal marginals at
times 1 and 3. Figures 5 and 6 show optimal solutions with diagonal symmetry, but this symmetry
was forced for esthetic reasons by adding additional constraints.

In this problem, the equilibrium distribution of .04 in each cell is reached at time 8 for both
players. For the evader, this distribution is a feasible extension of his optimal marginal distribution
at time 6. Were that true at time 6 for the searcher as well, then equilibrium would have been
reached one time period earlier at time 7. Instead the evader concentrates his effort at time 7
in the upper left-hand corner, taking advantage of a low searcher marginal level there. For all
times $t \ge 8$, the equilibrium distribution is optimal for both players, and the value of the $T$-period
game for $T \ge 8$ is $.1891 + .04(T - 8)$.

Figure 5. Searcher's Marginal Distribution (×1000).

Figure 6. Evader's Marginal Distribution (×1000).